155 research outputs found

    Saturating representation of loop conformational fragments in structure databanks

    BACKGROUND: Short fragments of proteins are fundamental starting points in various structure prediction applications, such as in fragment-based loop modeling methods but also in various full structure build-up procedures. The applicability and performance of these approaches depend on the availability of short fragments in structure databanks. RESULTS: We studied the representation of protein loop fragments up to 14 residues in length. All possible query fragments found in sequence databases (Sequence Space) were clustered and cross-referenced with available structural fragments in the Protein Data Bank (Structure Space). We found that the expansion of the PDB in the last few years resulted in a dense coverage of loop conformational fragments. For each loop of length 8 in the current Sequence Space there is at least one loop in Structure Space with 50% or higher sequence identity. By correlating sequence and structure clusters of loops we found that a 50% sequence identity generally guarantees structural similarity. These percentages of coverage at the 50% sequence identity cutoff drop to 96, 94, 68, 53, 33 and 13% for loops of length 9, 10, 11, 12, 13, and 14, respectively. There is not a single loop in the current Sequence Space at any length up to 14 residues that is not matched with a conformational segment that shares at least 20% sequence identity. This minimum observed identity is 40% for loops of 12 residues or shorter and is as high as 50% for loops of 10 residues or shorter. We also assessed the impact of rapidly growing sequence databanks on the estimated number of new loop conformations and found that, while the number of unique sequence segments increased about sixfold during the last five years, there are almost no unique conformational segments among these fragments of up to 12 residues. CONCLUSION: The results suggest that fragment-based prediction approaches are no longer limited by the completeness of fragments in databanks but rather by the effectiveness of the scoring and search algorithms used to locate them. The current favorable coverage and the trends observed will be further accentuated by the progress of the Protein Structure Initiative, which targets new protein folds and ultimately aims at providing an exhaustive coverage of the structure space.
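The coverage figures quoted in this abstract (the share of sequence-space loops that have at least one structural loop at or above a sequence identity cutoff) can be illustrated with a minimal sketch. It assumes ungapped, position-by-position identity between loops of equal length; the abstract describes a clustering-based analysis, which this toy version does not reproduce. The function names and the example loop sequences are hypothetical and are not taken from the paper.

```python
def sequence_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length fragments."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same length")
    if not a:
        raise ValueError("fragments must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def coverage(queries, structures, cutoff=0.5):
    """Share of query loops matched by at least one structural loop
    of the same length with identity >= cutoff."""
    if not queries:
        return 0.0
    covered = 0
    for q in queries:
        candidates = (s for s in structures if len(s) == len(q))
        if any(sequence_identity(q, s) >= cutoff for s in candidates):
            covered += 1
    return covered / len(queries)


# Toy 8-residue loops (invented for illustration, not data from the paper).
sequence_space = ["GSPDGKTA", "NGKDGSLT", "WWWWWWWW"]
structure_space = ["GSPDGRSA", "NGRDGALT"]

print(f"coverage at 50% identity: {coverage(sequence_space, structure_space):.2f}")
```

In this toy example two of the three query loops have a structural counterpart above the cutoff, so the coverage is two thirds. A real analysis would run over all loops extracted from sequence and structure databanks, one length at a time.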

    PCRPi-DB: a database of computationally annotated hot spots in protein interfaces

    Protein–protein interactions are central to almost any cellular process. Although protein interfaces are typically large, it is well established that only a relatively small region, the so-called ‘hot spot’, contributes the most to the total binding energy. There is a clear interest in identifying hot spots because of their applications in drug discovery and protein design. Presaging Critical Residues in Protein Interfaces Database (PCRPi-DB) is a public repository that archives computationally annotated hot spots in protein complexes for which the 3D structure is known. Hot spots have been annotated using a new and highly accurate computational method developed in the lab. PCRPi-DB is freely available to the scientific community at http://www.bioinsilico.org/PCRPIDB. Besides browsing and querying the contents of the database, users have access to extensive documentation and links to relevant on-line resources and contents. PCRPi-DB is updated on a weekly basis.

    Structural characteristics of novel protein folds

    Folds are the basic building blocks of protein structures. Understanding the emergence of novel protein folds is an important step towards understanding the rules governing the evolution of protein structure and function and for developing tools for protein structure modeling and design. We explored the frequency of occurrence of an exhaustively classified library of supersecondary structural elements (Smotifs) in protein structures, in order to identify features that would define a fold as novel compared to previously known structures. We found that a surprisingly small set of Smotifs is sufficient to describe all known folds. Furthermore, novel folds do not require novel Smotifs, but rather are a new combination of existing ones. Novel folds can be typified by the inclusion of a relatively higher number of rarely occurring Smotifs in their structures and, to a lesser extent, by a novel topological combination of commonly occurring Smotifs. When investigating the structural features of Smotifs, we found that the 10% most frequent ones have a higher fraction of internal contacts, while some of the rarest motifs are larger and contain a longer loop region.

    Improving the prediction of protein binding sites by combining heterogeneous data and Voronoi diagrams

    BACKGROUND: Protein binding site prediction by computational means can yield valuable information that complements and guides experimental approaches to determine the structure of protein complexes. Predictions become even more relevant and timely given the current resolution of protein interaction maps, where there is a very large and still expanding gap between the available information on: (i) which proteins interact and (ii) how proteins interact. Proteins interact through exposed residues that present differential physicochemical properties, and these can be exploited to identify protein interfaces. RESULTS: Here we present VORFFIP, a novel method for protein binding site prediction. The method makes use of a broad set of heterogeneous data and of residue environments defined by means of Voronoi Diagrams, which are integrated by a two-step Random Forest ensemble classifier. Four sets of residue features (structural, energy terms, sequence conservation, and crystallographic B-factors) used in different combinations together with three definitions of residue environment (Voronoi Diagrams, sequence sliding window, and Euclidean distance) have been analyzed in order to maximize the performance of the method. CONCLUSIONS: The integration of different forms of information, such as structural features, energy terms, evolutionary conservation and crystallographic B-factors, improves the performance of binding site prediction. Including the information of neighbouring residues also improves the prediction of protein interfaces. Among the different approaches that can be used to define the environment of exposed residues, Voronoi Diagrams provide the most accurate description. Finally, VORFFIP compares favourably to other methods reported in the recent literature.

    Presaging critical residues in protein interfaces-web server (PCRPi-W): a web server to chart hot spots in protein interfaces

    BACKGROUND: It is well established that only a portion of the residues that mediate protein-protein interactions (PPIs), the so-called hot spot, contributes the most to the total binding energy, and thus its identification is an important and relevant question that has clear applications in drug discovery and protein design. The experimental identification of hot spots is, however, a lengthy and costly process, and thus there is an interest in computational tools that can complement and guide experimental efforts. PRINCIPAL FINDINGS: Here, we present Presaging Critical Residues in Protein interfaces-Web server (http://www.bioinsilico.org/PCRPi), a web server that implements a recently described and highly accurate computational tool designed to predict critical residues in protein interfaces: PCRPi. PCRPi depends on the integration of structural, energetic, and evolutionary-based measures by using Bayesian Networks (BNs). CONCLUSIONS: PCRPi-W has been designed to provide the broad scientific community with easy and convenient access. Predictions are readily available for download or presented in a web page that includes, among other information, links to relevant files, sequence information, and a Jmol applet to visualize and analyze the predictions in the context of the protein structure.

    PCRPi, Presaging Critical Residues in Protein interfaces, a new computational tool to chart hot spots in protein interfaces

    Protein–protein interactions (PPIs) are ubiquitous in biology, and thus offer an enormous potential for the discovery of novel therapeutics. Although protein interfaces are large and lack defining physicochemical traits, it is well established that only a small portion of interface residues, the so-called hot spot residues, contribute the most to the binding energy of the protein complex. Moreover, recent successes in the development of novel drugs aimed at disrupting PPIs rely on targeting such residues. Experimental methods for describing critical residues are lengthy and costly; therefore, there is a need for computational tools that can complement experimental efforts. Here, we describe a new computational approach to predict hot spot residues in protein interfaces. The method, called Presaging Critical Residues in Protein interfaces (PCRPi), depends on the integration of diverse metrics into a unique probabilistic measure by using Bayesian Networks. We have benchmarked our method using a large set of experimentally verified hot spot residues and on a blind prediction on the protein complex formed by HRAS protein and a single domain antibody. Under both scenarios, PCRPi delivered consistent and accurate predictions. Finally, PCRPi is able to handle cases where some of the input data is either missing or not reliable (e.g. evolutionary information).

    Genome-wide prediction of prokaryotic two-component system networks using a sequence-based meta-predictor

    BACKGROUND: Two-component systems (TCS) are signalling complexes manifested by a histidine kinase (receptor) and a response regulator (effector). They are the most abundant signalling pathways in prokaryotes and control a wide range of biological processes. The pairing of these two components is highly specific, often requiring costly and time-consuming experimental characterisation. Therefore, there is considerable interest in developing accurate prediction tools to lessen the burden of experimental work and cope with the ever-increasing amount of genomic information. RESULTS: We present a novel meta-predictor, MetaPred2CS, which is based on a support vector machine. MetaPred2CS integrates six sequence-based prediction methods: in-silico two-hybrid, mirror-tree, gene fusion, phylogenetic profiling, gene neighbourhood, and gene operon. To benchmark MetaPred2CS, we also compiled a novel high-quality training dataset of experimentally deduced TCS protein pairs for k-fold cross validation, to act as a gold standard for TCS partnership predictions. Combining individual predictions using MetaPred2CS improved performance when compared to the individual methods and in comparison with a current state-of-the-art meta-predictor. CONCLUSION: We have developed MetaPred2CS, a support vector machine-based meta-predictor for prokaryotic TCS protein pairings. Central to the success of MetaPred2CS is a strategy of integrating individual predictors that improves the overall prediction accuracy, with the in-silico two-hybrid method contributing most to performance. MetaPred2CS outperformed other available systems in our benchmark tests, and is available online at http://metapred2cs.ibers.aber.ac.uk, along with our gold standard dataset of TCS interaction pairs. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0741-7) contains supplementary material, which is available to authorized users.